BnVec: Towards the Development of Word Embedding for Bangla Language Processing

Abstract

Progress in machine learning and statistical inference is facilitating advances in domains such as computer vision, natural language processing (NLP), and automation and robotics. Among the notable improvements in NLP, word embedding is one of the most widely used and revolutionary techniques. In this paper, we present an open-source library for Bangla extraction systems, named BnVec, which aims to serve the NLP research community by making some powerful word embedding techniques available. The library is split into two parts. The first is a suitably defined class that embeds words and gives access to six popular schemes (CountVectorizer, TF-IDF, Hash Vectorizer, Word2vec, fastText, and GloVe). The other is based on pre-trained distributed embedding models, including GloVe. The pre-trained models have been built by collecting content from newspapers, social media, and wiki articles. The total number of tokens used to build the models exceeds 395,289,960. The paper additionally reports the performance of these models under various hyper-parameter settings and then analyzes the results.

Similar resources

Automated Word Prediction in Bangla Language Using Stochastic Language Models

Word completion and word prediction are two important phenomena in typing that benefit users who type using a keyboard or other similar devices. They can have a profound impact on typing for disabled people. Our work is based on word prediction in Bangla sentences using stochastic, i.e. N-gram, language models such as unigram, bigram, trigram, deleted interpolation and backoff models for auto com...

Towards Accurate Handwritten Word Recognition for Hindi and Bangla

Building accurate lexicon-free handwritten text recognizers for Indic languages is a challenging task, mostly due to the inherent complexities of Indic scripts in addition to the cursive nature of handwriting. In this work, we demonstrate an end-to-end trainable CNN-RNN hybrid architecture which takes inspiration from recent advances in using residual blocks for training convolutional layers, ...

Learners’ Attitudes Toward the Effectiveness of Mobile-Assisted Language Learning (MALL) in Vocabulary Acquisition in the Iranian EFL Context: The Case of Word Lists, Audiobooks and Dictionary Use

The explosive growth of technology has opened exciting new educational opportunities for learners and educators. To stay up to date in language teaching, teachers today must adopt methods that use technology to support the learning of second and additional languages. Given the changes now taking place in the field of language teaching, this is an appropriate time to evaluate existing attitudes toward new technologies...

Language classification from bilingual word embedding graphs

We study the role of the second language in bilingual word embeddings in monolingual semantic evaluation tasks. We find strongly and weakly positive correlations between downstream task performance and second language similarity to the target language. Additionally, we show how bilingual word embeddings can be employed for the task of semantic language classification and that joint semantic sp...

The Impact of Morphological Awareness on the Vocabulary Development of the Iranian EFL Students

This study investigated the impact of explicit instruction of morphemic analysis and synthesis on the vocabulary development of the students. The participants were 90 junior high school students divided into two experimental groups and one control group. Morphological awareness techniques (analysis/synthesis) and conventional techniques were used to teach vocabulary in the experimental groups a...

Journal

Journal title: International Journal of Engineering & Technology

Year: 2021

ISSN: 2227-524X

DOI: https://doi.org/10.14419/ijet.v10i2.31538